This notebook will guide you through extracting tabular data from a PDF file.
The overall goal is to extract trajectory data from a PDF file and then plot the wellbore trajectory of well Hibernia 1638 (B-16 38).
The following example uses drilling trajectory data from the CNLOPB website.
CNLOPB - Canada-Newfoundland & Labrador Offshore Petroleum Board
We will use trajectory data from the well Hibernia 1638. These data are available in the CNLOPB data repository.
The trajectory data are available at this link: https://home-cnlopb.hub.arcgis.com/pages/hibernia-b-16-38
You can download the required file (INV-146206.pdf) by clicking "View Record".
## Import Necessary Libraries
# tabula-py reads PDF tables into pandas DataFrames
import tabula
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
# Load the PDF file to preview its content; you can adjust the width and height too
from IPython.display import IFrame
IFrame("INV-146206.pdf", width=800, height=600)
# Load the PDF file from the local machine
dfs = tabula.read_pdf("INV-146206.pdf", pages='all')
# Alternatively, you can load the PDF file directly from the website
dfs = tabula.read_pdf("https://cnlopb.maps.arcgis.com/sharing/rest/content/items/b8794768693e4cdf819b24635b41a0b7/data", pages='all')
# Check the number of tables extracted; tabula returns one DataFrame per page here, so there are 4
print(len(dfs))
4
page1 = dfs[0]
## See the content within page 1
page1
| Unnamed: 0 | MD | Incl | Azim Grid | Azim True | TVD | VSEC | NS | EW | DLS | Northing | Easting | Unnamed: 1 | Latitude | Unnamed: 2 | Longitude | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Comments | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | NaN | (m) | (°) | (°) | (°) | (m) | (m) | (m) | (m) | (°/30m) | (m) | (m) | NaN | (N/S ° ' ") | NaN | (E/W ° ' ") |
| 2 | Tie-In / Slot 45 | 0.00 | 0.00 | 0.00 | 1.62 | 0.00 | 0.00 | 4.57 | 23.98 | N/A | 5179815.29 | 669440.50 | N | 46 451.37 | W | 48 46 53.57 |
| 3 | Seabed | 156.91 | 0.00 | 0.00 | 1.62 | 156.91 | 0.00 | 4.57 | 23.98 | 0.00 | 5179815.29 | 669440.50 | N | 46 451.37 | W | 48 46 53.57 |
| 4 | 30" Csg Gyro | 165.00 | 0.03 | 324.49 | 326.11 | 165.00 | 0.00 | 4.57 | 23.98 | 0.11 | 5179815.29 | 669440.50 | N | 46 451.37 | W | 48 46 53.57 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 76 | NaN | 1235.21 | 23.46 | 75.46 | 77.08 | 1214.71 | 79.91 | 72.64 | 171.72 | 0.75 | 5179883.36 | 669588.24 | N | 46 453.44 | W | 48 46 46.52 |
| 77 | NaN | 1264.67 | 23.11 | 76.34 | 77.96 | 1241.77 | 87.40 | 75.48 | 183.02 | 0.50 | 5179886.20 | 669599.53 | N | 46 453.52 | W | 48 46 45.99 |
| 78 | NaN | 1294.22 | 25.04 | 78.10 | 79.72 | 1268.75 | 95.37 | 78.14 | 194.77 | 2.09 | 5179888.86 | 669611.29 | N | 46 453.60 | W | 48 46 45.43 |
| 79 | NaN | 1322.80 | 27.41 | 82.23 | 83.85 | 1294.39 | 104.21 | 80.28 | 207.21 | 3.14 | 5179890.99 | 669623.72 | N | 46 453.65 | W | 48 46 44.84 |
| 80 | GRS 1 | 1341.15 | 27.77 | 81.13 | 82.75 | 1310.65 | 110.30 | 81.51 | 215.62 | 1.02 | 5179892.22 | 669632.13 | N | 46 453.69 | W | 48 46 44.44 |
81 rows × 16 columns
page2 = dfs[1]
page3 = dfs[2]
page4 = dfs[3]
# See content in page 4
page4
| (m) | (m).1 | (m).2 | Unnamed: 0 | (mm) | Unnamed: 1 | Unnamed: 2 | |
|---|---|---|---|---|---|---|---|
| 0 | NaN | NaN | NaN | NaN | NaN | (mm) | NaN |
| 1 | 0.0 | 156.0 | NaN | Act Stns | 762.0 | 762.000 SLB_CNSG+CASING-Depth Only | B-16 38 / B-16 38 Definitive |
| 2 | 156.0 | 295.0 | NaN | Act Stns | 762.0 | 762.000 SLB_CNSG+CASING | B-16 38 / B-16 38 Definitive |
| 3 | 295.0 | 1322.8 | NaN | Act Stns | 762.0 | 762.000 SLB_NSG+MSHOT | B-16 38 / B-16 38 Definitive |
| 4 | 1322.8 | 6971.0 | NaN | Act Stns | 762.0 | 762.000 SLB_MWD+GMAG | B-16 38 / B-16 38 Definitive |
# We want to keep the first 3 pages in the data frame. Merge (concat) the three pages together
df = pd.concat([page1,page2,page3], axis=0)
# See the complete dataframe (Top 5 rows and bottom 5 rows)
df
| Unnamed: 0 | MD | Incl | Azim Grid | Azim True | TVD | VSEC | NS | EW | DLS | Northing | Easting | Unnamed: 1 | Latitude | Unnamed: 2 | Longitude | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Comments | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | NaN | (m) | (°) | (°) | (°) | (m) | (m) | (m) | (m) | (°/30m) | (m) | (m) | NaN | (N/S ° ' ") | NaN | (E/W ° ' ") |
| 2 | Tie-In / Slot 45 | 0.00 | 0.00 | 0.00 | 1.62 | 0.00 | 0.00 | 4.57 | 23.98 | N/A | 5179815.29 | 669440.50 | N | 46 451.37 | W | 48 46 53.57 |
| 3 | Seabed | 156.91 | 0.00 | 0.00 | 1.62 | 156.91 | 0.00 | 4.57 | 23.98 | 0.00 | 5179815.29 | 669440.50 | N | 46 451.37 | W | 48 46 53.57 |
| 4 | 30" Csg Gyro | 165.00 | 0.03 | 324.49 | 326.11 | 165.00 | 0.00 | 4.57 | 23.98 | 0.11 | 5179815.29 | 669440.50 | N | 46 451.37 | W | 48 46 53.57 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 85 | NaN | 6872.24 | 20.32 | 75.85 | 77.47 | 4373.00 | 4187.94 | -2472.59 | 3400.88 | 0.61 | 5177338.25 | 672817.24 | N | 46 43 38.07 | W | 48 44 17.89 |
| 86 | NaN | 6900.79 | 20.32 | 75.86 | 77.48 | 4399.78 | 4194.31 | -2470.17 | 3410.50 | 0.00 | 5177340.67 | 672826.85 | N | 46 43 38.14 | W | 48 44 17.44 |
| 87 | NaN | 6929.17 | 20.51 | 72.83 | 74.45 | 4426.37 | 4200.47 | -2467.49 | 3420.02 | 1.14 | 5177343.34 | 672836.38 | N | 46 43 38.22 | W | 48 44 16.98 |
| 88 | NaN | 6947.51 | 20.03 | 64.69 | 66.31 | 4443.58 | 4203.92 | -2465.20 | 3425.93 | 4.68 | 5177345.63 | 672842.29 | N | 46 43 38.28 | W | 48 44 16.70 |
| 89 | Proj to TD | 6971.00 | 20.00 | 63.90 | 65.52 | 4465.65 | 4207.75 | -2461.71 | 3433.18 | 0.35 | 5177349.12 | 672849.54 | N | 46 43 38.39 | W | 48 44 16.36 |
280 rows × 16 columns
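As a side note, passing `ignore_index=True` to `pd.concat` would renumber the rows immediately and make the separate `reset_index` step below unnecessary. A minimal sketch on toy frames (the column names and values here are illustrative, not the actual well data):

```python
import pandas as pd

# Two toy "pages" that mimic tables extracted from separate PDF pages
page_a = pd.DataFrame({"MD": [0.0, 156.91], "TVD": [0.0, 156.91]})
page_b = pd.DataFrame({"MD": [165.0, 170.0], "TVD": [165.0, 170.0]})

# Default concat keeps each page's original index labels (0, 1, 0, 1)
stacked = pd.concat([page_a, page_b], axis=0)

# ignore_index=True relabels the rows 0..n-1 in one step
renumbered = pd.concat([page_a, page_b], axis=0, ignore_index=True)

print(list(stacked.index))     # [0, 1, 0, 1]
print(list(renumbered.index))  # [0, 1, 2, 3]
```

Either approach works; the notebook keeps the original labels here and resets the index later, after dropping the header rows.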
# Dropping unwanted columns in the dataframe
df.drop(['Unnamed: 0', 'Unnamed: 1', 'Unnamed: 2'], axis=1, inplace=True)
# Check again to see the columns
df
| MD | Incl | Azim Grid | Azim True | TVD | VSEC | NS | EW | DLS | Northing | Easting | Latitude | Longitude | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | (m) | (°) | (°) | (°) | (m) | (m) | (m) | (m) | (°/30m) | (m) | (m) | (N/S ° ' ") | (E/W ° ' ") |
| 2 | 0.00 | 0.00 | 0.00 | 1.62 | 0.00 | 0.00 | 4.57 | 23.98 | N/A | 5179815.29 | 669440.50 | 46 451.37 | 48 46 53.57 |
| 3 | 156.91 | 0.00 | 0.00 | 1.62 | 156.91 | 0.00 | 4.57 | 23.98 | 0.00 | 5179815.29 | 669440.50 | 46 451.37 | 48 46 53.57 |
| 4 | 165.00 | 0.03 | 324.49 | 326.11 | 165.00 | 0.00 | 4.57 | 23.98 | 0.11 | 5179815.29 | 669440.50 | 46 451.37 | 48 46 53.57 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 85 | 6872.24 | 20.32 | 75.85 | 77.47 | 4373.00 | 4187.94 | -2472.59 | 3400.88 | 0.61 | 5177338.25 | 672817.24 | 46 43 38.07 | 48 44 17.89 |
| 86 | 6900.79 | 20.32 | 75.86 | 77.48 | 4399.78 | 4194.31 | -2470.17 | 3410.50 | 0.00 | 5177340.67 | 672826.85 | 46 43 38.14 | 48 44 17.44 |
| 87 | 6929.17 | 20.51 | 72.83 | 74.45 | 4426.37 | 4200.47 | -2467.49 | 3420.02 | 1.14 | 5177343.34 | 672836.38 | 46 43 38.22 | 48 44 16.98 |
| 88 | 6947.51 | 20.03 | 64.69 | 66.31 | 4443.58 | 4203.92 | -2465.20 | 3425.93 | 4.68 | 5177345.63 | 672842.29 | 46 43 38.28 | 48 44 16.70 |
| 89 | 6971.00 | 20.00 | 63.90 | 65.52 | 4465.65 | 4207.75 | -2461.71 | 3433.18 | 0.35 | 5177349.12 | 672849.54 | 46 43 38.39 | 48 44 16.36 |
280 rows × 13 columns
# Drop rows by index label. We want to drop the first two rows (the comment and unit rows)
df.drop([0,1], inplace= True)
# Also, reset the dataframe index
df.reset_index(inplace=True, drop = True)
df
| MD | Incl | Azim Grid | Azim True | TVD | VSEC | NS | EW | DLS | Northing | Easting | Latitude | Longitude | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.00 | 0.00 | 0.00 | 1.62 | 0.00 | 0.00 | 4.57 | 23.98 | N/A | 5179815.29 | 669440.50 | 46 451.37 | 48 46 53.57 |
| 1 | 156.91 | 0.00 | 0.00 | 1.62 | 156.91 | 0.00 | 4.57 | 23.98 | 0.00 | 5179815.29 | 669440.50 | 46 451.37 | 48 46 53.57 |
| 2 | 165.00 | 0.03 | 324.49 | 326.11 | 165.00 | 0.00 | 4.57 | 23.98 | 0.11 | 5179815.29 | 669440.50 | 46 451.37 | 48 46 53.57 |
| 3 | 170.00 | 0.14 | 217.05 | 218.67 | 170.00 | 0.00 | 4.57 | 23.98 | 0.91 | 5179815.29 | 669440.49 | 46 451.37 | 48 46 53.57 |
| 4 | 175.00 | 0.19 | 256.89 | 258.51 | 175.00 | -0.01 | 4.56 | 23.96 | 0.73 | 5179815.28 | 669440.48 | 46 451.37 | 48 46 53.57 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 269 | 6872.24 | 20.32 | 75.85 | 77.47 | 4373.00 | 4187.94 | -2472.59 | 3400.88 | 0.61 | 5177338.25 | 672817.24 | 46 43 38.07 | 48 44 17.89 |
| 270 | 6900.79 | 20.32 | 75.86 | 77.48 | 4399.78 | 4194.31 | -2470.17 | 3410.50 | 0.00 | 5177340.67 | 672826.85 | 46 43 38.14 | 48 44 17.44 |
| 271 | 6929.17 | 20.51 | 72.83 | 74.45 | 4426.37 | 4200.47 | -2467.49 | 3420.02 | 1.14 | 5177343.34 | 672836.38 | 46 43 38.22 | 48 44 16.98 |
| 272 | 6947.51 | 20.03 | 64.69 | 66.31 | 4443.58 | 4203.92 | -2465.20 | 3425.93 | 4.68 | 5177345.63 | 672842.29 | 46 43 38.28 | 48 44 16.70 |
| 273 | 6971.00 | 20.00 | 63.90 | 65.52 | 4465.65 | 4207.75 | -2461.71 | 3433.18 | 0.35 | 5177349.12 | 672849.54 | 46 43 38.39 | 48 44 16.36 |
274 rows × 13 columns
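The drop-and-reset pattern above can be sketched on a toy frame (the values are illustrative):

```python
import pandas as pd

# A toy column with two non-data header rows at the top
df_toy = pd.DataFrame({"MD": ["Comments", "(m)", "0.00", "156.91"]})

# Drop the two header rows by their index labels
df_toy = df_toy.drop([0, 1])

# Relabel rows 0..n-1; drop=True discards the old index
# instead of keeping it as a new column
df_toy = df_toy.reset_index(drop=True)

print(df_toy["MD"].tolist())  # ['0.00', '156.91']
print(list(df_toy.index))     # [0, 1]
```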
# In this step, we make a copy of the dataframe as a backup.
df_copy= df.copy()
# Check the data type of each column. It also shows the total number of non-null values per column
df.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 274 entries, 0 to 273 Data columns (total 13 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 MD 274 non-null object 1 Incl 274 non-null object 2 Azim Grid 274 non-null object 3 Azim True 274 non-null object 4 TVD 274 non-null object 5 VSEC 274 non-null object 6 NS 274 non-null object 7 EW 274 non-null object 8 DLS 274 non-null object 9 Northing 274 non-null object 10 Easting 274 non-null object 11 Latitude 274 non-null object 12 Longitude 274 non-null object dtypes: object(13) memory usage: 28.0+ KB
# It shows that all data are stored as 'object' type. In the next steps, we will convert them to numeric.
cols = df.columns[df.dtypes.eq('object')]
# Check the selected columns
cols
Index(['MD', 'Incl', 'Azim Grid', 'Azim True', 'TVD', 'VSEC', 'NS', 'EW',
'DLS', 'Northing', 'Easting', 'Latitude', 'Longitude'],
dtype='object')
# Convert data to numeric; errors='coerce' sets any value that cannot be parsed to NaN.
df[cols] = df[cols].apply(pd.to_numeric, errors='coerce', axis=1)
# Let us check the data type again. Now it shows that there is no data for Latitude and Longitude
df.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 274 entries, 0 to 273 Data columns (total 13 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 MD 274 non-null float64 1 Incl 274 non-null float64 2 Azim Grid 274 non-null float64 3 Azim True 274 non-null float64 4 TVD 274 non-null float64 5 VSEC 274 non-null float64 6 NS 274 non-null float64 7 EW 274 non-null float64 8 DLS 273 non-null float64 9 Northing 274 non-null float64 10 Easting 274 non-null float64 11 Latitude 0 non-null float64 12 Longitude 0 non-null float64 dtypes: float64(13) memory usage: 28.0 KB
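The empty 'Latitude' and 'Longitude' columns are expected: values such as '46 43 38.07' contain spaces, so `pd.to_numeric` cannot parse them and `errors='coerce'` replaces them with NaN. A minimal sketch of this behaviour (the sample values are illustrative):

```python
import math
import pandas as pd

# A mix of a clean number, an 'N/A' marker, and a space-separated DMS string
s = pd.Series(["0.00", "N/A", "46 43 38.07"])

# Only the clean number survives; the other two become NaN
converted = pd.to_numeric(s, errors="coerce")

print(converted.tolist())  # [0.0, nan, nan]
```

This is also why the DLS column ends up with one NaN: its first entry is the string 'N/A'.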
Because the coercion replaced every 'Latitude' and 'Longitude' value with NaN, those columns cannot be recovered from this dataframe. Instead, we retrieve them from the backup copy.
Let us look at the backup data again. There are unwanted spaces between the numbers in 'Latitude' and 'Longitude', which is why pd.to_numeric could not parse them.
df_copy
| MD | Incl | Azim Grid | Azim True | TVD | VSEC | NS | EW | DLS | Northing | Easting | Latitude | Longitude | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.00 | 0.00 | 0.00 | 1.62 | 0.00 | 0.00 | 4.57 | 23.98 | N/A | 5179815.29 | 669440.50 | 46 451.37 | 48 46 53.57 |
| 1 | 156.91 | 0.00 | 0.00 | 1.62 | 156.91 | 0.00 | 4.57 | 23.98 | 0.00 | 5179815.29 | 669440.50 | 46 451.37 | 48 46 53.57 |
| 2 | 165.00 | 0.03 | 324.49 | 326.11 | 165.00 | 0.00 | 4.57 | 23.98 | 0.11 | 5179815.29 | 669440.50 | 46 451.37 | 48 46 53.57 |
| 3 | 170.00 | 0.14 | 217.05 | 218.67 | 170.00 | 0.00 | 4.57 | 23.98 | 0.91 | 5179815.29 | 669440.49 | 46 451.37 | 48 46 53.57 |
| 4 | 175.00 | 0.19 | 256.89 | 258.51 | 175.00 | -0.01 | 4.56 | 23.96 | 0.73 | 5179815.28 | 669440.48 | 46 451.37 | 48 46 53.57 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 269 | 6872.24 | 20.32 | 75.85 | 77.47 | 4373.00 | 4187.94 | -2472.59 | 3400.88 | 0.61 | 5177338.25 | 672817.24 | 46 43 38.07 | 48 44 17.89 |
| 270 | 6900.79 | 20.32 | 75.86 | 77.48 | 4399.78 | 4194.31 | -2470.17 | 3410.50 | 0.00 | 5177340.67 | 672826.85 | 46 43 38.14 | 48 44 17.44 |
| 271 | 6929.17 | 20.51 | 72.83 | 74.45 | 4426.37 | 4200.47 | -2467.49 | 3420.02 | 1.14 | 5177343.34 | 672836.38 | 46 43 38.22 | 48 44 16.98 |
| 272 | 6947.51 | 20.03 | 64.69 | 66.31 | 4443.58 | 4203.92 | -2465.20 | 3425.93 | 4.68 | 5177345.63 | 672842.29 | 46 43 38.28 | 48 44 16.70 |
| 273 | 6971.00 | 20.00 | 63.90 | 65.52 | 4465.65 | 4207.75 | -2461.71 | 3433.18 | 0.35 | 5177349.12 | 672849.54 | 46 43 38.39 | 48 44 16.36 |
274 rows × 13 columns
# Remove the spaces with .str.replace(' ', '')
df_copy['Latitude']= df_copy['Latitude'].str.replace(' ', '')
df_copy['Longitude']=df_copy['Longitude'].str.replace(' ', '')
# See the change in 'Latitude' and 'Longitude'
df_copy
| MD | Incl | Azim Grid | Azim True | TVD | VSEC | NS | EW | DLS | Northing | Easting | Latitude | Longitude | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.00 | 0.00 | 0.00 | 1.62 | 0.00 | 0.00 | 4.57 | 23.98 | N/A | 5179815.29 | 669440.50 | 46451.37 | 484653.57 |
| 1 | 156.91 | 0.00 | 0.00 | 1.62 | 156.91 | 0.00 | 4.57 | 23.98 | 0.00 | 5179815.29 | 669440.50 | 46451.37 | 484653.57 |
| 2 | 165.00 | 0.03 | 324.49 | 326.11 | 165.00 | 0.00 | 4.57 | 23.98 | 0.11 | 5179815.29 | 669440.50 | 46451.37 | 484653.57 |
| 3 | 170.00 | 0.14 | 217.05 | 218.67 | 170.00 | 0.00 | 4.57 | 23.98 | 0.91 | 5179815.29 | 669440.49 | 46451.37 | 484653.57 |
| 4 | 175.00 | 0.19 | 256.89 | 258.51 | 175.00 | -0.01 | 4.56 | 23.96 | 0.73 | 5179815.28 | 669440.48 | 46451.37 | 484653.57 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 269 | 6872.24 | 20.32 | 75.85 | 77.47 | 4373.00 | 4187.94 | -2472.59 | 3400.88 | 0.61 | 5177338.25 | 672817.24 | 464338.07 | 484417.89 |
| 270 | 6900.79 | 20.32 | 75.86 | 77.48 | 4399.78 | 4194.31 | -2470.17 | 3410.50 | 0.00 | 5177340.67 | 672826.85 | 464338.14 | 484417.44 |
| 271 | 6929.17 | 20.51 | 72.83 | 74.45 | 4426.37 | 4200.47 | -2467.49 | 3420.02 | 1.14 | 5177343.34 | 672836.38 | 464338.22 | 484416.98 |
| 272 | 6947.51 | 20.03 | 64.69 | 66.31 | 4443.58 | 4203.92 | -2465.20 | 3425.93 | 4.68 | 5177345.63 | 672842.29 | 464338.28 | 484416.70 |
| 273 | 6971.00 | 20.00 | 63.90 | 65.52 | 4465.65 | 4207.75 | -2461.71 | 3433.18 | 0.35 | 5177349.12 | 672849.54 | 464338.39 | 484416.36 |
274 rows × 13 columns
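One caveat: stripping the spaces packs degrees, minutes, and seconds into a single number (e.g. '46 43 38.07' becomes 464338.07). That is sufficient to make the columns numeric, and the 3D plot below uses NS/EW/TVD rather than these values, but the packed number is not a decimal-degree coordinate. If true decimal degrees were needed, a small helper could convert the original DMS strings instead. This function is an illustration, not part of the original notebook, and it assumes well-formed three-part 'deg min sec' strings:

```python
def dms_to_decimal(dms: str) -> float:
    """Convert a 'deg min sec' string such as '46 43 38.07' to decimal degrees."""
    deg, minutes, seconds = (float(part) for part in dms.split())
    return deg + minutes / 60 + seconds / 3600

print(round(dms_to_decimal("46 43 38.07"), 4))  # 46.7272
```

Applied to the backup frame, this could be used as `df_copy['Latitude'].apply(dms_to_decimal)` before any space-stripping (with a sign flip for W longitudes if desired).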
# Check the data types again; the values are still stored as 'object'
df_copy.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 274 entries, 0 to 273 Data columns (total 13 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 MD 274 non-null object 1 Incl 274 non-null object 2 Azim Grid 274 non-null object 3 Azim True 274 non-null object 4 TVD 274 non-null object 5 VSEC 274 non-null object 6 NS 274 non-null object 7 EW 274 non-null object 8 DLS 274 non-null object 9 Northing 274 non-null object 10 Easting 274 non-null object 11 Latitude 274 non-null object 12 Longitude 274 non-null object dtypes: object(13) memory usage: 28.0+ KB
# Since the columns are still stored as 'object', select and convert them to numeric
cols = df_copy.columns[df_copy.dtypes.eq('object')]
df_copy[cols] = df_copy[cols].apply(pd.to_numeric, errors='coerce', axis=1)
# Check the data types again. Now all columns are numeric (DLS has one NaN from its 'N/A' entry)
df_copy.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 274 entries, 0 to 273 Data columns (total 13 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 MD 274 non-null float64 1 Incl 274 non-null float64 2 Azim Grid 274 non-null float64 3 Azim True 274 non-null float64 4 TVD 274 non-null float64 5 VSEC 274 non-null float64 6 NS 274 non-null float64 7 EW 274 non-null float64 8 DLS 273 non-null float64 9 Northing 274 non-null float64 10 Easting 274 non-null float64 11 Latitude 274 non-null float64 12 Longitude 274 non-null float64 dtypes: float64(13) memory usage: 28.0 KB
df_copy
| MD | Incl | Azim Grid | Azim True | TVD | VSEC | NS | EW | DLS | Northing | Easting | Latitude | Longitude | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.00 | 0.00 | 0.00 | 1.62 | 0.00 | 0.00 | 4.57 | 23.98 | NaN | 5179815.29 | 669440.50 | 46451.37 | 484653.57 |
| 1 | 156.91 | 0.00 | 0.00 | 1.62 | 156.91 | 0.00 | 4.57 | 23.98 | 0.00 | 5179815.29 | 669440.50 | 46451.37 | 484653.57 |
| 2 | 165.00 | 0.03 | 324.49 | 326.11 | 165.00 | 0.00 | 4.57 | 23.98 | 0.11 | 5179815.29 | 669440.50 | 46451.37 | 484653.57 |
| 3 | 170.00 | 0.14 | 217.05 | 218.67 | 170.00 | 0.00 | 4.57 | 23.98 | 0.91 | 5179815.29 | 669440.49 | 46451.37 | 484653.57 |
| 4 | 175.00 | 0.19 | 256.89 | 258.51 | 175.00 | -0.01 | 4.56 | 23.96 | 0.73 | 5179815.28 | 669440.48 | 46451.37 | 484653.57 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 269 | 6872.24 | 20.32 | 75.85 | 77.47 | 4373.00 | 4187.94 | -2472.59 | 3400.88 | 0.61 | 5177338.25 | 672817.24 | 464338.07 | 484417.89 |
| 270 | 6900.79 | 20.32 | 75.86 | 77.48 | 4399.78 | 4194.31 | -2470.17 | 3410.50 | 0.00 | 5177340.67 | 672826.85 | 464338.14 | 484417.44 |
| 271 | 6929.17 | 20.51 | 72.83 | 74.45 | 4426.37 | 4200.47 | -2467.49 | 3420.02 | 1.14 | 5177343.34 | 672836.38 | 464338.22 | 484416.98 |
| 272 | 6947.51 | 20.03 | 64.69 | 66.31 | 4443.58 | 4203.92 | -2465.20 | 3425.93 | 4.68 | 5177345.63 | 672842.29 | 464338.28 | 484416.70 |
| 273 | 6971.00 | 20.00 | 63.90 | 65.52 | 4465.65 | 4207.75 | -2461.71 | 3433.18 | 0.35 | 5177349.12 | 672849.54 | 464338.39 | 484416.36 |
274 rows × 13 columns
# We can see the data distribution for all columns
df_copy.describe()
| MD | Incl | Azim Grid | Azim True | TVD | VSEC | NS | EW | DLS | Northing | Easting | Latitude | Longitude | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 274.000000 | 274.000000 | 274.000000 | 274.000000 | 274.000000 | 274.000000 | 274.000000 | 274.000000 | 273.000000 | 2.740000e+02 | 274.000000 | 274.000000 | 274.000000 |
| mean | 3141.013358 | 39.768248 | 120.544489 | 122.164489 | 2084.979781 | 1712.290401 | -961.175146 | 1438.670182 | 1.002271 | 5.178850e+06 | 670855.122482 | 285927.164599 | 449541.350328 |
| std | 2138.744004 | 27.682906 | 43.394699 | 43.394699 | 1184.114039 | 1559.378382 | 988.242623 | 1214.209008 | 0.861026 | 9.881963e+02 | 1214.151893 | 207109.131848 | 118723.774567 |
| min | 0.000000 | 0.000000 | 0.000000 | 1.620000 | 0.000000 | -0.020000 | -2485.290000 | 23.810000 | 0.000000 | 5.177326e+06 | 669440.320000 | 46440.010000 | 48450.550000 |
| 25% | 1065.640000 | 17.212500 | 93.040000 | 94.660000 | 1056.260000 | 44.577500 | -1984.965000 | 114.612500 | 0.350000 | 5.177826e+06 | 669531.125000 | 46451.835000 | 484437.357500 |
| 50% | 3048.500000 | 37.800000 | 132.025000 | 133.645000 | 2186.265000 | 1455.110000 | -702.695000 | 1308.130000 | 0.660000 | 5.179108e+06 | 670724.585000 | 464338.515000 | 484544.525000 |
| 75% | 4995.142500 | 69.675000 | 134.977500 | 136.597500 | 2860.665000 | 3259.542500 | 4.570000 | 2607.835000 | 1.500000 | 5.179815e+06 | 672024.235000 | 464416.757500 | 484649.237500 |
| max | 6971.000000 | 71.730000 | 324.490000 | 326.110000 | 4465.650000 | 4207.750000 | 86.640000 | 3433.180000 | 4.680000 | 5.179897e+06 | 672849.540000 | 464459.530000 | 484653.580000 |
Now we will plot the trajectory data in 3D. This generates an interactive 3D trajectory plot.
import plotly.express as px
fig = px.line_3d(df_copy, x='NS', y='EW', z='TVD',
                 title="Hibernia 1638 Trajectory Profile",
                 width=800, height=800)
fig.show()
The TVD profile appears inverted (depth increasing upward), so we flip the Z axis by reversing its range.
import plotly.express as px
fig = px.line_3d(df_copy, x='NS', y='EW', z='TVD', range_z=[4500, 0],
                 title="Hibernia 1638 Trajectory Profile",
                 width=800, height=800)
fig.show()
# We can save the interactive plot to an HTML file
fig.write_html("1638 Wellbore Trajectory.html")
# Save the data to a CSV file
df_copy.to_csv('Hibernia 1638 Trajectory data.csv')
#### Thank you
#### Email your questions/comments/suggestions: mmh710@mun.ca
##### https://github.com/mojammelhuque/Drilling-Data-Analytics